6/23/2021
The Dictionary portion of Pysistant has been completed. The issues, features, and more are listed below.
Problems and Solutions
I have added a Dictionary to Pysistant by using a combination of the Requests library,
Beautiful Soup 4 and Regex to scrape definitions from
WordNet.
The implementation of a Dictionary proved to be a simple task to program. The majority of the issues I ran into were of my own creation,
due to the number of features in Beautiful Soup 4 and the difficulty of finding a website that was simple enough to scrape but also had the
definitions that I wanted.
The number of features that Beautiful Soup 4 supports had me scraping the documentation for how I should tackle scraping websites for information.
After a few days of reading the documentation and watching videos on the library itself, I learned about the find_all method and how to use a loop
to return the information that I wanted to pull from the site. But I still needed a website to put this theory into practice.
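As a rough illustration of the find_all-plus-loop approach, here is a minimal sketch. The sample HTML, the extract_items helper, and the choice of li tags are all assumptions made up for this example; this is not Pysistant's actual code or any real site's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a scraped results page.
SAMPLE_HTML = """
<ul>
  <li>S: (n) dog (a member of the genus Canis)</li>
  <li>S: (v) chase (go after with the intent to catch)</li>
</ul>
"""


def extract_items(html, tag="li"):
    """Return the visible text of every `tag` element found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # find_all returns every matching element; the loop pulls out its text.
    for element in soup.find_all(tag):
        results.append(element.get_text(" ", strip=True))
    return results


if __name__ == "__main__":
    for line in extract_items(SAMPLE_HTML):
        print(line)
```

In a real run the HTML string would come from a page fetched with Requests rather than from a constant.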
Finding a website that was simple enough to scrape but also had the definitions that I needed proved to be a difficult and unique challenge. I
immediately looked towards larger sites like Merriam-Webster and Dictionary.com, but their layouts proved to be either too complex or too bloated to
scrape efficiently. I looked further and eventually found WordNet. With its simple design and even simpler HTML, I could finally get to work on
scraping definitions from the site.
And with a bit of Regex I was able to scrape WordNet for definitions of words based on user input. The majority of the challenge for this project
proved to be outside the IDE and led to a resource whose minimalist design triumphed over its more convoluted counterparts.
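To give a sense of what the Regex step might look like, here is a small sketch that splits one scraped result line into its parts. The line format shown in the comment, the ENTRY_PATTERN expression, and the parse_entry helper are assumptions for illustration only, and may not match what Pysistant does or how WordNet formats every entry.

```python
import re

# Assumed shape of one result line after the HTML tags are stripped:
#   S: (n) dog, domestic dog (a member of the genus Canis) "example sentence"
ENTRY_PATTERN = re.compile(
    r"^S:\s*\((?P<pos>\w+)\)\s*"    # part of speech, e.g. (n)
    r"(?P<words>[^(]+?)\s*"         # the word and any synonyms
    r"\((?P<definition>[^)]*)\)"    # the definition inside parentheses
)


def parse_entry(line):
    """Return a dict with pos, words, and definition, or None if no match."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    sample = 'S: (n) dog, domestic dog (a member of the genus Canis) "the dog barked all night"'
    print(parse_entry(sample))
```

One limitation of this pattern is that a definition containing its own parentheses would be cut off at the first closing one, so a real scraper would likely need something more careful.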
Follow the development of Pysistant here or on GitHub
This blog post is tagged: Pysistant.